Introduction

Column

Song Information

Welcome to my portfolio, where I explore two electronic tracks that have personally resonated with me: Nightfall Future and Lost in Dreams. Though I didn’t create them, I selected them for their calm and introspective moods. Nightfall Future has a moody, textured feel, while Lost in Dreams offers a lighter, dreamlike quality. Together, they reflect the emotional range that draws me to electronic music. This portfolio uses these tracks to explore how musical features and structure communicate mood, comparing them to the broader class corpus.

Lost in Dreams

“Lost in Dreams” is a chill house track with a dreamy, uplifting feel. It has mid-level arousal and valence, but slightly higher danceability. Its smooth melodies and steady rhythm create a light, pleasant mood. While it blends in with balanced tracks in the corpus, its floating character and warm tone make it stand out.

Nightfall Future

“Nightfall Future” leans into a calm, introspective sound with low arousal and valence. Despite this, its steady beat gives it moderate danceability. This balance of stillness and structure places it as a grounded, relaxed presence in the dataset.

Column

Corpus visualization

This is a visualization of the whole class corpus based on the highest-performing features, which are explained in detail in the ‘Classification’ tab. My selected songs, “Nightfall Future” and “Lost in Dreams,” are highlighted in blue and are both Non-AI. In the graph, the two tracks appear close to each other in feature space, suggesting they are similar. However, listening to them reveals very different feelings and styles. This portfolio explores those differences in more detail.

Pitch & Harmony

Column

Nightfall Future

Pitch

Pitch is the perceived highness or lowness of a sound. It is based on the frequency of the sound wave: higher frequencies are heard as higher pitches, and lower frequencies as lower pitches. The following graphs visualize this feature.
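As a small illustration (not part of the analysis pipeline), the standard mapping places A4 = 440 Hz at MIDI pitch 69, with every doubling of frequency adding twelve semitones:

# Map a frequency in Hz to the nearest MIDI pitch number (A4 = 440 Hz = 69).
freq_to_midi <- function(freq_hz) round(69 + 12 * log2(freq_hz / 440))

freq_to_midi(440)     # 69 (A4)
freq_to_midi(261.63)  # 60 (middle C)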

Chromagram

This chromagram shows how pitch classes are distributed over the full duration of “Nightfall Future.” Although no single pitch dominates the track completely, several tones appear frequently with moderate intensity. In particular, we observe strong and repeated activity around the notes A, D, and E.

These pitches suggest a connection to the A minor scale or its relative modes. The overall energy distribution remains subtle and evenly spread, reflecting the track’s ambient nature. The harmonic movement is fluid and non-intrusive, with soft transitions between tones rather than sharp shifts. This matches the sonic goal of the track, which aims to create an immersive and calming environment.

Additionally, the lack of strong tonal contrast or dominant chord roots reinforces the sense of openness and introspection in the piece. The chromagram highlights this by showing a broad but balanced use of notes, contributing to the song’s relaxed, meditative atmosphere.

Chordogram

A chordogram uses chroma features to group the notes into major, minor, or seventh chords, helping us understand the harmonic structure of the track. In “Nightfall Future,” we see that the chord usage is quite varied, without a clear or consistent tonal center.

This reflects the ambient and experimental nature of the track. There are many short-lasting chords and frequent shifts between keys, which suggest that harmony plays a more atmospheric than functional role. Some chords appear only briefly, while others return more often, creating a floating and unpredictable feel throughout the song.
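To make the template-matching idea behind the chordogram concrete, here is a minimal sketch; the `chroma` vector is a hypothetical input rather than part of the actual pipeline.

# `chroma` is assumed: a 12-element vector of pitch-class energies for one frame.
a_minor <- rep(0, 12)
a_minor[c(10, 1, 5)] <- 1   # pitch classes A, C, and E (C = 1, ..., B = 12)

cosine_sim <- function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))

# Higher values mean the frame fits the chord better; repeating this for every
# frame and every major, minor, and seventh template yields the chordogram.
cosine_sim(chroma, a_minor)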

Chroma similarity

This chroma-based self-similarity matrix reveals how harmonically similar each moment of Nightfall Future is to every other moment in the track.

The matrix shows a dense, intricate pattern with many small-scale diagonal lines, indicating local repetition and harmonic consistency. There are no large block-like segments, suggesting the track does not follow a traditional verse-chorus or drop-based structure.

Instead, Nightfall Future maintains a subtle but continuous harmonic progression, with smooth transitions rather than sharp contrasts. This aligns with the track’s introspective and ambient feel, where mood and texture evolve gradually over time.

The lack of clear sectional boundaries, combined with fine-grained internal repetition, highlights the track’s balance between harmonic variation and cohesion—supporting its relaxed but structurally stable identity within the dataset.
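For reference, a matrix of this kind can be sketched in a few lines; the chroma matrix here is an assumed input, not the exact tool behind the figure.

# `chroma_mat` is assumed: a frames x 12 matrix of chroma vectors for the track.
# Cosine similarity between every pair of frames: unit-normalise each row,
# then take all pairwise inner products.
normed <- chroma_mat / sqrt(rowSums(chroma_mat^2))
ssm    <- normed %*% t(normed)

# Fine diagonal stripes indicate repeated harmonic material; large blocks
# would indicate clearly separated sections.
image(ssm, useRaster = TRUE)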

Column

Lost in Dreams

Chromagram

The chromagram of “Lost in Dreams” reveals a more defined pitch center compared to “Nightfall Future.” Notably, the notes C#, F#, and G# appear prominently throughout the piece, suggesting a tonal focus on C# minor or a related key. The clear presence of these tones across the timeline reflects harmonic stability and coherence.

While a few other pitch classes, such as E and B, also make frequent appearances, their intensity is more moderate, reinforcing their role as supporting tones rather than tonal anchors. This consistency in pitch activity supports the track’s smooth and dreamy feel, where melodic lines flow easily over a steady harmonic foundation.

Chordogram

The chordogram for “Lost in Dreams” shows how different chords appear throughout the track. Lighter areas indicate stronger matches with certain chord templates. From the image, we can see that several minor chords—especially C# minor and F# minor—are very active, suggesting that the track stays mostly within a minor key.

In the first half of the track, the chord progression is fairly stable. Chords like C# minor and A major appear consistently, giving the piece a strong tonal center. Around the middle, there’s a bit more variation, with short shifts to related keys and some seventh chords showing up more clearly. This creates a subtle sense of movement while still keeping a calm and cohesive mood.

In the last part, the pattern becomes slightly more repetitive again. The return of familiar chords reinforces the dream-like and relaxing feeling of the track, making it feel whole and complete. The chordogram helps to show how harmony plays a supportive role in maintaining the emotional flow.

Chroma similarity

The chroma similarity matrix for “Lost in Dreams” gives a detailed look at how the harmonic content develops over time. The first half of the matrix (0–60s) is full of soft, consistent diagonal patterns. This suggests that the beginning of the track has a steady and repetitive harmonic structure, which fits well with the relaxing and smooth mood of the song. Around the middle (60–90s), things become a bit more scattered. There are fewer clear repetitions, hinting at some subtle changes or a short break in the regular flow.

In the final section (90–120s), diagonal traces reappear but are more irregular, indicating that the song returns to earlier ideas but with small variations. These slight changes add interest without breaking the calm and dreamy atmosphere.

All in all, the matrix shows that “Lost in Dreams” relies on repeating harmonic structures with just enough variation to keep things engaging, which supports its peaceful and flowing character.

Tempo & Energy

Column

Nightfall Future

Tempo & Energy

Tempo is the speed of a track, often measured in beats per minute (BPM), and plays a big role in how music feels to a listener. Fast tempos can feel energetic or urgent, while slower tempos are often more relaxed.

Energy, on the other hand, reflects how intense or dynamic a song feels. It’s influenced by loudness, rhythm, and how the music is played or produced. While tempo gives a basic structure, energy adds variation and emotional character.

In the following analyses, tempo and energy will be used to explore how each track builds its mood and movement. The following graphs visualize these features.

Energy novelty

The energy novelty graph for Nightfall Future shows a series of quick, sharp spikes at the beginning, suggesting a sudden and lively start with strong rhythmic or instrumental entries. After this early burst, the energy becomes more stable, with only small fluctuations throughout the middle section.

Near the end, there is a noticeable rise in energy again, though not as intense as the intro. This gives the track a sense of return or buildup, adding subtle variation while keeping the mood consistent. Overall, the graph reflects a calm but slightly dynamic structure, with carefully placed shifts to keep the listener engaged.
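For reference, an energy novelty curve of this kind can be approximated as the half-wave-rectified difference of frame-wise RMS energy; the sketch below is an illustration, not the exact tool used for the figure.

# `samples` (audio as a numeric vector) and `sr` (sample rate in Hz) are assumed inputs.
frame_len <- round(0.05 * sr)        # 50 ms frames
starts    <- seq(1, length(samples) - frame_len, by = frame_len)
rms       <- sapply(starts, function(s) sqrt(mean(samples[s:(s + frame_len - 1)]^2)))

# Only increases in energy count as novelty; decreases are clipped to zero.
novelty <- pmax(diff(rms), 0)
plot(novelty, type = "l", xlab = "Frame", ylab = "Energy novelty")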

Tempogram

The tempogram of Nightfall Future shows a very steady and dominant tempo around 132 BPM throughout almost the entire track. This strong horizontal line indicates a consistent pulse, which gives the song a stable rhythmic feel. It helps create a sense of flow and predictability that aligns with the calming atmosphere of the track.

There are a few lighter lines and fluctuations above and below the main tempo, especially toward the end, but they are less prominent. These suggest small rhythmic layers or background elements, but they don’t interfere with the main beat. Overall, the tempo remains locked in, supporting the song’s laid-back and controlled structure.
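A tempogram like this can be approximated by checking, for short overlapping windows of a novelty curve, how strongly the curve correlates with itself at tempo-related lags. The sketch below is an illustration of the idea rather than the tool behind the figure.

# `nov` (a novelty curve) and `fs_nov` (its frame rate in frames per second) are assumed.
# Autocorrelation tempogram: one row per analysis window, one column per lag.
# A lag of L frames corresponds to a tempo of 60 * fs_nov / L BPM.
tempogram <- function(nov, fs_nov, win_sec = 8, hop_sec = 1, bpm = c(60, 240)) {
  win    <- round(win_sec * fs_nov)
  hop    <- round(hop_sec * fs_nov)
  lags   <- seq(round(60 * fs_nov / bpm[2]), round(60 * fs_nov / bpm[1]))
  starts <- seq(1, length(nov) - win, by = hop)
  t(sapply(starts, function(s) {
    frame <- nov[s:(s + win - 1)]
    acf(frame, lag.max = max(lags), plot = FALSE)$acf[lags + 1]
  }))
}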

Column

Lost in Dreams

Energy novelty

The energy novelty curve for Lost in Dreams shows a calm and low-energy structure throughout most of the track, with only a few significant peaks. There are two large spikes—one around 30 seconds and one just before 90 seconds—which suggest sudden changes or drops in the song’s arrangement.

Aside from these moments, the track remains consistent and smooth in its energy profile. This stability reflects a restrained approach to dynamics, helping create a spacious and coherent atmosphere. The occasional spikes add contrast and interest, but the overall impression is one of subtlety rather than intensity.

Tempogram

The tempogram of Lost in Dreams reveals a remarkably steady and consistent tempo across the full duration of the track. A dominant line is clearly visible around 130 BPM, which suggests that the song maintains a regular and unchanging pulse from start to finish. This kind of rhythmic stability often reflects a clear structural backbone, which in turn supports the listener’s sense of timing and engagement.

The presence of additional, fainter parallel bands above and below the main tempo line hints at the use of rhythmic layering or subdivisions in the arrangement. These could be created by percussive patterns, hi-hats, or background rhythmic elements that repeat at half-time or double-time. Such features can add depth to the groove without disrupting the overall pacing of the track.

What stands out most is the lack of dramatic tempo changes, accelerations, or slowdowns. This indicates that the song avoids surprises in rhythm, which helps reinforce its hypnotic and immersive effect. In the context of the broader analysis, this aligns with the track’s smooth energy profile and supports its calming, focused atmosphere. The tempogram thus confirms the role of rhythm as a foundation for the emotional tone of the piece.

Timbre

Column

Nightfall Future

Timbre

Timbre, often referred to as the “color” or “tone quality” of a sound, is what makes one instrument or voice sound different from another, even if they are playing the same pitch at the same loudness. It arises from the unique combination of frequencies produced by a sound source, including the fundamental tone and its overtones or harmonics.

In music analysis, timbre plays an important role in shaping the emotional and stylistic character of a track. It is influenced by factors such as instrumentation, texture, articulation, and the sound envelope. While rhythm, melody, and harmony describe what is played, timbre explains how it sounds.

In the following sections, timbre will be explored using visual tools like Mel-frequency cepstral coefficients (MFCCs), which capture important information about the spectral shape of the audio. These visualizations will help illustrate how timbre varies over time in each track and contributes to their distinct sonic identities.
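As a small sketch of how such a plot can be assembled: the MFCC matrix and the 10 ms hop size below are assumptions, and the coefficients themselves would come from an external feature extractor.

library(ggplot2)

# `mfcc` is assumed: a coefficients x frames matrix of MFCC values.
# Long format: one row per (coefficient, frame) cell of the matrix.
cep <- expand.grid(coefficient = seq_len(nrow(mfcc)) - 1,
                   frame       = seq_len(ncol(mfcc)))
cep$value <- as.vector(mfcc)      # column-major order matches expand.grid
cep$time  <- cep$frame * 0.01     # assumed 10 ms hop between frames

ggplot(cep, aes(time, coefficient, fill = value)) +
  geom_raster() +
  labs(x = "Time (s)", y = "MFCC coefficient", fill = "Magnitude")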

Cepstrogram

The cepstrogram for Nightfall Future provides a visual representation of the timbral content over time, using Mel-frequency cepstral coefficients (MFCCs). Each row in the graph corresponds to a different coefficient, which captures specific spectral characteristics of the audio signal—ranging from the overall spectral shape to finer-grained textural details.

This particular cepstrogram shows a high level of consistency across most coefficients, with only slight fluctuations throughout the piece. This suggests that the timbre remains stable over time, with no dramatic changes in instrumentation, articulation, or sonic texture. The consistent horizontal bands imply that the track has a smooth and controlled sound profile, matching its calm and ambient feel.

The strong definition of the lower coefficients (especially 0 and 1) highlights the dominance of the overall spectral envelope, which shapes the warmth and fullness of the sound. Meanwhile, the lack of high-contrast variation in the upper coefficients points to minimal high-frequency transients or brightness changes, supporting the interpretation of the track as soft and mellow in tone.

Spectral Novelty

The spectral novelty graph for Nightfall Future reveals how the frequency content of the track evolves over time, highlighting points where significant changes occur in the sound spectrum. These changes often reflect musical events such as new instrument entries, rhythmic transitions, or textural shifts.
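A common way to compute such a curve is half-wave-rectified spectral flux; the magnitude spectrogram in the sketch below is an assumed input.

# `spec` is assumed: a magnitude spectrogram (frequency bins x frames).
flux <- diff(t(spec))     # change in each frequency bin between consecutive frames
flux[flux < 0] <- 0       # keep only increases, i.e. newly appearing energy
novelty <- rowSums(flux)  # one novelty value per frame transition

plot(novelty, type = "l", xlab = "Frame", ylab = "Spectral novelty")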

In this graph, two clear clusters of increased activity stand out: one in the first half (around 30–60 seconds) and another in the second half (around 120–150 seconds). These segments show a higher density of spikes, suggesting active passages with more frequent timbral changes, possibly due to layered synths, rhythmic variation, or transitions.

The sections before and after these clusters are much calmer, with low values that indicate fewer changes in spectral content. This alternation between activity and stillness contributes to a dynamic yet controlled sound structure. The overall shape supports the idea that Nightfall Future relies on steady progression, with occasional bursts of variation to maintain interest while preserving its calm atmosphere.

Timbre similarity

This timbre similarity matrix shows how the sound color—or timbral texture—of Nightfall Future evolves over time. In this case, the matrix reveals a remarkably structured pattern. Regular square-like blocks appear throughout, indicating that similar timbral characteristics repeat at various points. These repeating textures may result from recurring synth layers or consistent sound design choices that give the track its smooth and unified feel.

There is also a high degree of symmetry around the main diagonal, meaning that timbral features change slowly and predictably over time. This aligns with the track’s calming aesthetic, which avoids abrupt shifts or harsh transitions. Overall, the matrix supports the idea that Nightfall Future uses a restrained and repetitive timbral palette to maintain a consistent sonic atmosphere.

Column

Lost in Dreams

Cepstrogram

The cepstrogram for Lost in Dreams gives us a view into how the timbre—essentially the “color” or texture of sound—changes over time. In this plot, we see that the higher coefficients (toward the top) stay relatively consistent, indicating a stable sound texture across the track. The lower coefficients—especially Coefficient 0 and 1—are more active and saturated, which is typical as they carry the most general spectral shape information.

Overall, this timbral consistency supports the track’s dreamy and immersive atmosphere. Instead of dramatic changes in texture, Lost in Dreams maintains a smooth and coherent sonic environment, relying more on layering and subtle modulation than abrupt shifts.

Spectral novelty

The first part of the graph (0–30s) shows a gentle buildup with low to moderate activity, suggesting a gradual introduction of layers. Between 30 and 60 seconds, the novelty rises, indicating more noticeable changes—perhaps a transition or the entry of new instruments. The middle portion of the track maintains a steady level of variation, with consistent but not overwhelming peaks, hinting at a structured and flowing development of musical ideas.

In the final stretch (around 100s onward), the novelty remains moderately active, implying ongoing subtle shifts that keep the sound engaging without becoming chaotic. Overall, the energy of change is well balanced, and the track maintains a smooth yet evolving texture throughout its duration.

Classification

Column

Baseline

To compare the songs in the corpus and determine which characteristics are most useful for classification, a range of musical features is used. These features are either computed directly from the audio or estimated by models, and they give insight into how a song sounds and feels. They also help evaluate whether a track might have been generated by AI or composed by a human.

Below is an overview of all the features included in the classification model:

  • Arousal: Captures the intensity or excitement level of the track.
  • Danceability: Estimates how easy it is to dance to the song, based on tempo, rhythm, and beat clarity.
  • Instrumentalness: Indicates the presence or absence of vocals. Higher values suggest fewer or no vocals.
  • Tempo: Represents the pace of the song in beats per minute (BPM).
  • Valence: Measures how positive or happy the music feels. Higher values are associated with cheerful moods.
  • Approachability: Describes how familiar or easy the track is to listen to, especially for casual audiences.
  • Engagingness: Reflects the track’s ability to maintain attention through dynamic shifts and interesting structures.

# A tibble: 2 × 3
  class  precision recall
  <fct>      <dbl>  <dbl>
1 AI         0.632  0.735
2 Non-AI     0.606  0.488
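For context, a baseline of this kind could be fit with tidymodels roughly as sketched below; the `corpus` data frame, its column names, and the train/test handling are assumptions rather than the exact code behind the table above.

library(tidymodels)

# `corpus` is assumed: one row per track, with the features listed above
# and a `class` factor holding the AI / Non-AI label.
set.seed(1)
split <- initial_split(corpus, strata = class)
train <- training(split)
test  <- testing(split)

knn_spec <- nearest_neighbor(neighbors = 1) |>
  set_mode("classification") |>
  set_engine("kknn")

knn_fit <- workflow() |>
  add_formula(class ~ arousal + danceability + instrumentalness +
                tempo + valence + approachability + engagingness) |>
  add_model(knn_spec) |>
  fit(data = train)

preds <- bind_cols(test, predict(knn_fit, test))   # adds .pred_class
conf_mat(preds, truth = class, estimate = .pred_class)
precision(preds, truth = class, estimate = .pred_class)
recall(preds, truth = class, estimate = .pred_class)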

Feature importance

The confusion matrix helps evaluate how well the model distinguishes between AI-generated and non-AI tracks. The top-left and bottom-right squares show the correct predictions, while the other two indicate misclassifications: false positives (top-right) and false negatives (bottom-left). Although all features are used, the model performs only modestly overall, with precision around 0.6 for both classes and noticeably weak recall for the Non-AI class.

To better understand what drives the predictions, we use a random forest classifier to measure the importance of each feature. This allows us to identify which musical properties contribute most to the model’s decisions and which have less impact.
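A feature-importance step of this kind could look roughly like the sketch below, using the ranger engine; the `corpus` data frame and column names are the same assumptions as above, not the exact code behind the figure.

library(tidymodels)

rf_spec <- rand_forest(trees = 500) |>
  set_mode("classification") |>
  set_engine("ranger", importance = "impurity")

# `corpus` and its columns are assumed, as in the baseline sketch.
rf_fit <- rf_spec |>
  fit(class ~ arousal + danceability + instrumentalness + tempo +
        valence + approachability + engagingness,
      data = corpus)

# ranger stores one impurity-based importance score per feature.
importance <- sort(rf_fit$fit$variable.importance, decreasing = TRUE)
dotchart(rev(importance), xlab = "Importance (impurity decrease)")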

Instrumentalness stands out as the most influential feature, suggesting that the presence or absence of vocals plays a key role in distinguishing AI-generated music from human-made tracks.

Arousal and danceability follow closely, indicating that the overall energy and rhythmic qualities of a track are also important factors. Valence, which reflects the emotional positivity of the music, ranks next and may help differentiate between expressive human compositions and more neutral AI outputs.

Approachability and engagingness show moderate importance, hinting that listener accessibility and attention retention play supporting but less decisive roles. Tempo is the least informative feature, possibly due to its wide usage across both AI and non-AI music, reducing its discriminatory power.

Column

Adjusted heatmap

The adjusted heatmap presents the performance of the refined classification model. This version excludes the features tempo, engagingness, and approachability, and replaces the original k-nearest neighbour algorithm with a random forest classifier.

The confusion matrix now shows improved accuracy, with the number of false positives for AI (top-right) reduced notably compared to the initial model. This suggests that removing less informative features and switching to a more robust model helped to better distinguish between AI-generated and non-AI tracks. While some misclassifications still occur, overall precision and recall have improved, indicating a more reliable prediction process.

# A tibble: 2 × 3
  class  precision recall
  <fct>      <dbl>  <dbl>
1 AI         0.75   0.735
2 Non-AI     0.690  0.707

Compared to the original k-nearest neighbour approach, this model achieves higher average precision and recall across both AI and non-AI categories. The most notable improvement is seen in the non-AI class, where recall increases from 0.49 to 0.71 and precision from 0.61 to 0.69. This shows the model is more capable of correctly identifying non-AI tracks.

The detection of AI-generated songs also improves, though to a lesser extent. Overall, this adjustment results in around 6 to 7 additional correct classifications, marking a significant gain in model performance relative to the baseline.

Conclusion

Column

Visualization conclusion

This scatter plot displays the four most important features selected by the random forest model: arousal (x-axis), instrumentalness (y-axis), danceability (bubble size), and valence (color).

There is a visible negative correlation between arousal and instrumentalness: songs with higher arousal levels often have lower instrumentalness scores. In other words, more energetic tracks tend to include vocals more frequently.

Valence also appears to follow this trend, with brighter-colored bubbles (higher valence) clustering in the high-arousal, low-instrumentalness area. Danceability, represented by bubble size, varies more widely, although the larger bubbles sit mostly toward the higher-arousal side of the plot.
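A plot of this kind can be sketched with ggplot2, assuming the same `corpus` data frame and column names as in the classification tab.

library(ggplot2)

# `corpus` is assumed: one row per track with the four plotted features.
ggplot(corpus,
       aes(x = arousal, y = instrumentalness,
           size = danceability, colour = valence)) +
  geom_point(alpha = 0.7) +
  labs(x = "Arousal", y = "Instrumentalness",
       size = "Danceability", colour = "Valence")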

In the filtered version of the plot, a few patterns stand out:

  • Arousal (x-axis) and instrumentalness (y-axis) still show a clear inverse relationship.
  • The distinction between AI-generated (yellow) and non-AI (purple) tracks becomes more visible, with AI songs tending toward higher arousal and lower instrumentalness.
  • Danceability (bubble size) does not clearly separate the groups but appears slightly more pronounced in the AI region.
  • My two songs are now marked in blue, offering a clearer point of comparison within the filtered dataset.

Column

Conclusion

This project has shown how a focused set of features, especially arousal, instrumentalness, and spectral novelty, can meaningfully distinguish structural and expressive traits in AI-generated music.
While attributes such as tempo and approachability were more ambiguous across the dataset, these features offered consistent insight into the internal behavior of both tracks: Nightfall Future and Lost in Dreams.

Across the visualizations, Lost in Dreams consistently displayed smoother patterns, stronger harmonic stability, and a more coherent dynamic contour.
From the self-similarity matrices to the novelty curves, the track appeared restrained and controlled, aligning with qualities like low spectral variance and consistent tempo around 130 BPM.
In contrast, Nightfall Future featured more fragmented transitions and pronounced peaks in both energy and spectral novelty, suggesting greater internal contrast and less structural predictability.

These differences emerged despite both tracks being created with the same prompt and process.
What set them apart became clear only through this feature-based analysis, which exposed the subtle ways in which generative systems can produce divergent results depending on randomness, sampling, or emergent behavior.

Rather than simply evaluating whether a track “sounds AI,” this process helped clarify how generative tools behave under the hood—and how those internal dynamics align (or clash) with human musical expectations.
It was a way of turning analysis into reflection: not only on AI music classification, but also on how I approach and evaluate my own creative outputs made with these tools.